Memory advantage for untrustworthy faces: Replication across lab- and web-based studies

The Covid-19 pandemic imposed new constraints on empirical research and forced researchers to transfer from traditional laboratory research to the online environment. This study tested the validity of a web-based episodic memory paradigm by comparing participants’ memory performance for trustworthy and untrustworthy facial stimuli in a supervised laboratory setting and an unsupervised web setting. Consistent with previous results, we observed enhanced episodic memory for untrustworthy compared to trustworthy faces. Most importantly, this memory bias was comparable in the online and the laboratory experiment, suggesting that web-based procedures are a promising tool for memory research.


Introduction
The Covid-19 pandemic has caused significant disruptions to all aspects of life and work. Due to the need to reduce social contact, the work environment has suffered a radical transformation. Scientific research has not been an exception and many researchers, particularly in the field of psychology and neuroscience, have been forced to transfer lab-based behavioral research to the online environment [1,2]. Although web-based research has some inherent limitations due to lack of experimental control and potential technical challenges (e.g., variations in internet speed and display settings), as well as unknown participant behavior (through anonymous and unsupervised participation), it also has some advantages over traditional laboratory settings: it allows the recruitment of large and diverse samples of participants in terms of age, gender, origin, culture and social status, minimizes organizational issues such as scheduling conflicts and time constraints, eliminates potential experimenter effects, and reduces costs related to laboratory space, personnel hours, equipment, and administration [3][4][5][6][7][8][9][10].
Recent studies also indicated that online experiments show comparable results to those conducted in the laboratory environment [11][12][13][14][15][16][17][18]. For instance, Crump and colleagues [14] examined the similarities between lab- and web-based settings in a series of behavioral experiments, including Stroop, Switching, Flanker, and Simon tasks. The authors observed that the web-based environment replicated the standard experimental effects found in a traditional laboratory setting [14]. However, the authors also addressed disparities between lab-based and web-based research, possibly due to timing differences of participants' web browsers and other technical challenges. While web-based experimental procedures allow for efficient data collection with results comparable to those of laboratory experiments, there is still reason for caution, and the validity of web settings needs to be further empirically determined. Therefore, the primary goal of the present study was to further test the reliability of web-based tests in experimental psychology and related domains. We focused on an episodic memory task (recognition memory), which was conducted both online and in a laboratory setting. Deficits in episodic memory, i.e., the process through which details about previous experiences and events are stored, have been associated with various neurodegenerative and psychological disorders (e.g., anxiety disorders; for review see [19]). Examining whether such a relevant process can be reliably evaluated in web settings may have a positive impact on the detection of mnemonic dysfunctions and may facilitate the implementation of potential intervention programs. Thus, we examined the similarities between online and lab settings in episodic memory performance.
Both lab- and web-based experiments followed an identical protocol, in which neutral facial expressions differing in trustworthiness were encoded (i.e., free picture viewing procedure) and immediately retrieved (i.e., recognition memory procedure). Prior research found that untrustworthy faces are better remembered than trustworthy faces [20-23], a well-documented memory advantage that might serve an adaptive purpose to avoid potentially harmful social interactions [24]. If episodic memory processes can reliably be measured in web settings, the online and lab samples should show comparable levels of memory accuracy. In addition, the expected trustworthiness effect (enhanced memory for untrustworthy faces) should be comparable in both samples. Furthermore, earlier studies showed mixed results on the interplay between episodic memory and anxiety disorders such as social anxiety (for review see [19,25]). Thus, we also collected social anxiety scores from participants in the web sample as an efficient add-on to explore the influence of individual differences in social anxiety on memory performance for untrustworthy and trustworthy faces.

Participants
A total of 33 students (30 female, 31 right-handed, M age = 20.61 years, SD age = 2.42 years) from the University of Greifswald participated in the lab study in exchange for course credits (for EEG results related to the encoding session see [26]), and 111 participants (87 female, 100 right-handed, M age = 24.39 years, SD age = 5.07 years) from the University of Potsdam completed the web study in exchange for course credits. All participants provided written informed consent for the study procedure, which was approved by the ethics committee of the German Society for Psychology (DGPs) and the University of Potsdam and carried out in accordance with the Declaration of Helsinki. Before data analysis, the data were closely inspected to check that all participants executed the recognition memory task as instructed (e.g., by excluding participants who randomly guessed during their old/new judgement as indicated by Pr values equalling 0). After data inspection, no participants were excluded based on their overall Pr index, suggesting that participants in the unsupervised online environment took the study as seriously as those in the supervised laboratory environment. However, seven participants from the web sample were excluded from the analyses (N = 3 were not students, N = 4 exceeded the overall completion time using the interquartile range criterion), leaving a final sample of 104 students (83 female, 94 right-handed, M age = 24.08 years, SD age = 4.91 years) in the web sample.
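The interquartile range criterion mentioned above can be illustrated with a short sketch. The exact rule applied to the completion times is not specified in the text; the version below assumes the common upper fence Q3 + 1.5 × IQR, and the function names and example times are hypothetical.

```python
import statistics


def iqr_upper_fence(values, k=1.5):
    """Return Q3 + k * IQR. The multiplier k = 1.5 is an assumption,
    not a value reported in the study."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_slow_participants(completion_minutes, k=1.5):
    """Return the indices of participants whose completion time
    exceeds the upper fence."""
    fence = iqr_upper_fence(completion_minutes, k)
    return [i for i, t in enumerate(completion_minutes) if t > fence]


# Hypothetical completion times in minutes (not data from the study).
times = [20, 21, 22, 22, 23, 24, 25, 26, 27, 95]
fence = iqr_upper_fence(times)
slow = flag_slow_participants(times)
print(fence, slow)
```

Only an upper fence is used here because the text describes participants who exceeded the completion time; a lower fence could be added in the same way if unusually fast completions were also of concern.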

Procedure
Both the lab study and the web study followed an identical protocol and consisted of two experimental sessions: an incidental encoding session and a recognition memory session, which took place immediately after the encoding session. The stimulus material consisted of 120 neutral Caucasian faces with direct gaze, which were previously evaluated as trustworthy (30 female, 30 male) or untrustworthy (30 female, 30 male) (same stimuli as in [22,23]; see [26] for details about stimulus construction and evaluation). The faces were converted to greyscale, matched in position and luminance, and surrounded by an elliptical mask to minimize the influence of expression-irrelevant features on face perception during the task (cf. [22,23,26]).
During encoding, participants viewed a total of 60 neutral faces (30 trustworthy, 30 untrustworthy) presented in pseudorandom order. They were instructed to pay attention to the faces but neither informed that the faces differed in trustworthiness nor that a recognition test would follow (incidental encoding). Each trial began with a fixation cross presented for an interval that varied randomly between 1500 ms and 3000 ms, followed by a face presented once for 3000 ms. Directly after the free-viewing task, participants performed the recognition memory task. During the recognition memory task the previously seen neutral faces were presented intermixed with 60 new, i.e., not seen during encoding, neutral faces. Participants saw one face at a time (for 3000 ms) and were instructed to indicate (after the question old/new was presented) whether they had previously seen the stimulus during encoding (old face) or not (new face) by pressing the corresponding key on a keyboard (lab sample) or by clicking on the corresponding response field on the screen using their mouse or trackpad (web sample). The position of the response field on the screen was counterbalanced across participants, as were the response buttons on the keyboard in the lab environment. Following the old/new judgement, participants rated their memory confidence by pressing (lab sample) or clicking (web sample) on the corresponding percentage on a Likert scale ranging from 0% (i.e., not confident) to 100% (i.e., absolutely confident).
In the web sample, after the recognition session, participants completed the Liebowitz Social Anxiety Scale (LSAS; [27, 28]), which is a good proxy for the presence of social anxiety.
All stimuli in the lab setting were presented on a 27-inch monitor (1920 × 1080 pixels) using Presentation (Neurobehavioral Systems, Berkeley, CA), while participants were seated in a comfortable upholstered chair. The online experiment was implemented using the PsyToolkit platform [29,30] and was automatically run in full-screen mode on modern web browsers (Mozilla Firefox, Microsoft Edge, and Google Chrome). The experiment was not compatible with Apple Safari browsers and did not run on tablets or smartphones. Participants most frequently used a laptop with a trackpad to complete the experiment (laptop with trackpad: N = 83; laptop with mouse: N = 10; PC: N = 11).

Statistical analysis
To evaluate behavioral performance in both studies, the discrimination index Pr = p(H) − p(FA) and the bias index Br = p(FA) / (1 − Pr), where p(H) is the hit rate and p(FA) is the false alarm rate, were calculated overall and separately for trustworthy and untrustworthy faces [31]. Higher Pr values are generally associated with better memory discrimination. Br values greater than 0.5 indicate a liberal response bias (i.e., bias to respond old), whereas lower values indicate a conservative response bias. We applied a two-sample Kolmogorov-Smirnov test to check for differences in Pr and Br distributions between the lab and the web samples [32]. To determine whether both experimental procedures differed in the recognition memory task and to replicate the trustworthiness effect (i.e., enhanced memory for untrustworthy compared to trustworthy faces) in both samples, Pr and Br were further analyzed using a repeated-measures ANOVA with the within-subject factor Trustworthiness (trustworthy, untrustworthy) and the between-subject factor Sample (lab sample, web sample). For confidence ratings, a three-way ANOVA was performed using the factors Memory (hits, false alarms), Trustworthiness (trustworthy, untrustworthy), and Sample (lab sample, web sample). We further performed an exploratory analysis to evaluate potential gender differences in memory performance, independent of the task environment. To do so, all 24 male participants from both samples were pooled together with 24 randomly selected, age-matched female participants from both samples. For Pr, we performed an ANOVA using this subsample (N = 48) with the within-subject factor Trustworthiness (trustworthy, untrustworthy) and the between-subject factor Gender (female, male). Correlational analysis was further performed to test trustworthiness-specific associations between memory performance and social anxiety scores, using Pearson's correlation. The significance level for all analyses was set at p < 0.05. All statistical analyses were conducted using RStudio [33].
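As an illustration of the indices defined above, the following Python sketch computes Pr and Br from response counts, and the two-sample Kolmogorov-Smirnov statistic D from two sets of scores. It is not the analysis code used in the study (which was run in RStudio); the function names and example counts are assumptions, and the p value of the test is omitted.

```python
import bisect


def rates(hits, n_old, false_alarms, n_new):
    """Convert response counts to hit rate p(H) and false alarm rate p(FA)."""
    return hits / n_old, false_alarms / n_new


def discrimination_index(hit_rate, fa_rate):
    """Pr = p(H) - p(FA)."""
    return hit_rate - fa_rate


def bias_index(hit_rate, fa_rate):
    """Br = p(FA) / (1 - Pr). Undefined when Pr equals 1."""
    pr = discrimination_index(hit_rate, fa_rate)
    if pr >= 1.0:
        raise ValueError("Br is undefined when Pr equals 1")
    return fa_rate / (1.0 - pr)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest absolute gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical participant: 21 hits on 30 old faces and
# 6 false alarms on 30 new faces (not data from the study).
p_h, p_fa = rates(21, 30, 6, 30)
pr = discrimination_index(p_h, p_fa)  # approximately 0.5
br = bias_index(p_h, p_fa)            # approximately 0.4, i.e., conservative
print(round(pr, 3), round(br, 3))
```

In this example Br falls below 0.5, which by the definition above corresponds to a conservative response bias.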

General memory effects
There was no significant difference in average memory performance between the lab and web  1).
When directly comparing all 24 male participants from both samples with 24 randomly selected female participants from both samples (matched in age), we did not find any sex-related differences in memory recognition, F(1, 90) = 0.39, p = 0.53, nor interactions of Gender and Trustworthiness, F(1, 90) = 0.11, p = 0.74. Table 1 summarizes participants' memory performance for trustworthy and untrustworthy faces in the lab and the web sample. When directly comparing the samples, for Pr, a main effect of Trustworthiness, F(1, 134) = 45.70, p < 0.001, ηp² = 0.25, indicated higher memory discrimination for untrustworthy, compared to trustworthy faces, irrespective of Sample (see Fig 1). Moreover, no Sample, F(1, 134) = 1.04, p = 0.31, ηp² = 0.007, or Sample x Trustworthiness effect, F < 1, were observed, suggesting that Sample did not have any specific effects on Pr (see Fig 1). For Br, results revealed a main effect of Trustworthiness, F(1, 134) = 22.00, p < 0.001,

Social anxiety and memory
Data on social anxiety were collected from participants in the web sample with the LSAS (social phobia: M = 53.81, SD = 25.75; avoidance: M = 26.86, SD = 13.92; fear: M = 26.95, SD = 13.92). Correlational analysis did not reveal any significant relationship between the social anxiety scores and the Pr index either for trustworthy faces, r(102) = −0.04, p = 0.67, or for untrustworthy faces, r(102) = −0.08, p = 0.44. Similarly, no significant correlation was observed between the social anxiety scores and the Br index for any of the trustworthiness categories (−0.02 < rs < −0.006, ps > 0.83). Moreover, social anxiety scores did not correlate with participants' overall confidence for trustworthy, r(102) = 0.05, p = 0.62, or for untrustworthy faces, r(102) = −0.04, p = 0.67. When considered separately, there were no significant correlations between the social anxiety scores and hits for any of the trustworthiness categories (−0.11 < rs < 0.04, ps > 0.28); the same applies to false alarms (0.08 < rs < 0.14, ps > 0.18).
Taken together, memory accuracy was comparable between the lab and web samples and showed comparable distributions, suggesting no effect of context on recognition memory. In addition, the trustworthiness effect (i.e., enhanced memory performance for untrustworthy compared to trustworthy faces) was similarly observed across samples. Significant differences only emerged for the Br index, suggesting that some disparities exist between the two settings.
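The correlations above are reported as r(df), where df = N − 2 (102 for the 104 participants in the web sample). A minimal Python sketch of Pearson's correlation coefficient and its degrees of freedom is given below; the function name and the example scores are hypothetical, and the p value is not computed.

```python
import math


def pearson_r(x, y):
    """Return Pearson's r and its degrees of freedom (N - 2)
    for two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy), n - 2


# Hypothetical scores for five participants (not data from the study).
anxiety = [1, 2, 3, 4, 5]
memory = [2, 1, 4, 3, 5]
r, df = pearson_r(anxiety, memory)
print(f"r({df}) = {r:.2f}")
```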

Discussion
The aim of the present study was to investigate the similarities between lab- and web-based settings in an episodic memory paradigm using trustworthy and untrustworthy faces. Our results showed that episodic memory performance (i.e., accuracy, distribution, confidence) was comparable across samples, suggesting that overall memory was not affected by the setting in which the task took place.
We also observed a memory-enhancing effect for untrustworthy faces, consistent with many previous studies [20-23]. This memory advantage may indicate that untrustworthy faces are highly relevant stimuli for organizing social behavior [34] because they might signal potentially dangerous or harmful encounters [24, 34]. As hypothesized, this trustworthiness effect (i.e., enhanced memory for untrustworthy faces), typically found in a laboratory context, was also replicated in the supposedly "uncontrolled" online environment, suggesting that this effect is highly robust. Given the striking similarity in memory accuracy, our data therefore suggest that data quality was not affected when moving from laboratory to web-based testing, which is in line with several previous studies testing other psychological processes [14, 15, 17, 18, 35-39]. Our study further extends the validity of online experimental procedures to memory paradigms (e.g., [16, 40-43]).
Although memory accuracy was similar across settings, participants in the web sample showed a stronger conservative response bias for untrustworthy compared to trustworthy faces. Response bias has been described as the decision rule an individual uses when faced with uncertainty (i.e., on recognition memory tasks) [44]. It is theoretically independent of discriminability [31]. A shift in the response criterion has previously been observed when memory judgments become more difficult (e.g., after a delay, see [45]) or when participants are under stress or threat (e.g., [46]). Although speculative at this point, the latter factor may have contributed to the change in response criterion (towards conservative) in the web sample, given that the web-based study was conducted in a familiar, non-stressful home environment, in which participants may show more cautious or controlled behavior than in the unfamiliar (lab-based), likely more stressful, environment. Future work should test this possibility directly (i.e., under threat or stress conditions).
It is worth mentioning that both samples consisted predominantly of female participants which might have relevant implications in line with the evidence of gender differences in emotion processing and recognition (for review see [47]). Some previous studies observed that female compared to male participants are more reactive to emotional and stressful events as indicated by larger electrodermal activity and subjective ratings [48,49]. However, experimental studies of sex differences in facial emotion recognition paradigms have reported contradictory findings (for sex-differentiated findings see [49][50][51][52][53]; but also see [54,55]). Our exploratory analysis with a subsample of matched female and male participants, however, did not reveal any sex-related differences in recognition memory performance. Web-based experimental procedures provide an opportunity to shed further light on the disparity within the literature regarding biological sex differences (i.e., in a gender-matched sample directly testing for sex differences in memory for faces differing in trustworthiness) due to the facilitated recruitment of large and diverse samples of participants (e.g., in terms of gender) while keeping organizational issues and costs low (e.g., [5][6][7]).
Taking advantage of this, we were able to collect social anxiety scores from participants in the online experiment without much effort. This enabled us to explore the influence of individual differences in social anxiety on memory for facial expressions, which previous research has found to be enhanced, impaired, or unaffected [19,25]. In the present study, however, we found no indication of a significant relationship between social phobia scores and memory performance for facial expressions varying in trustworthiness, which indicates no memory bias (e.g., [19]; cf. [56] for the role of social anxiety in trustworthiness judgments). It should be noted, however, that the social anxiety scores were obtained in the context of the Covid-19 pandemic, which has caused social withdrawal for reasons other than social anxiety (i.e., social distancing, fear of infection). An emerging literature investigating the impact of social isolation and loneliness during the Covid-19 pandemic has shown increased depression and (social) anxiety symptoms relative to before the pandemic in samples of young adults [57][58][59]. Some evidence from our study may also point in that direction, since social anxiety scores were higher than in other representative samples (for American and British student samples see [60,61]). Therefore, even though participants were instructed to fill in the questionnaire with their usual, habitual conditions in mind, it is not clear how individual responses were biased or affected by the pandemic, or whether they were confounded with participants' adherence to Covid-19 safety measures.
Web-based experimental procedures, however, provide an opportunity to accelerate, and even to continue (i.e., in the context of the Covid-19 pandemic), empirical research and might lead the way in promoting transparency and reproducibility in behavioral research (i.e., access to the code alone would be adequate to completely replicate an experiment) [14]. Importantly, online testing might also be valuable for psychophysiological research [62,63], particularly considering current technological advances. For instance, recent studies have shown that heart rate variability can be measured with smartphones (i.e., via video plethysmography by placing a finger on the camera lens of a smartphone [62]) and eye movements with webcams [63]. So far, these measures have only been implemented in laboratory studies testing episodic memory for untrustworthy faces [23, 64]. Thus, the combination of behavioral and physiological measures in web-based and/or ambulatory settings may be a promising avenue to investigate psychological processes in situ (and in times of pandemic).

Conclusion
This study tested the validity of a web-based episodic memory paradigm that included trustworthy and untrustworthy facial stimuli. We compared memory performance in a supervised laboratory setting with an unsupervised web setting and observed comparable memory effects. In both studies, we further found that untrustworthy faces were better remembered than trustworthy faces (replicating prior lab studies). Altogether, our findings suggest that online testing could be a promising tool for scientific research.